A write barrier is a small piece of code that executes whenever a reference (pointer) is stored into a field or array element, keeping the garbage collector's metadata synchronized with program memory to enable efficient generational and concurrent collection.
In garbage collection, a write barrier is not a memory-ordering primitive (as in concurrent programming) but a notification mechanism. It is a snippet of code inserted before or after every pointer write, acting as the garbage collector's "eyes and ears" during program execution. Without it, the collector would lose track of how objects reference each other, potentially leading to incorrect memory reclamation. Write barriers are essential for modern collection strategies such as generational GC (tracking old-to-young pointers) and concurrent marking (maintaining correctness while the program runs).
Tracking Cross-Generation References: In generational GC, the barrier detects when an old-generation object is modified to point to a young-generation object. These references must be recorded so the young collector can find them without scanning the entire old heap.
Maintaining the Tricolor Invariant: During concurrent marking, the barrier keeps the collector's marking state consistent while the program modifies the object graph. It marks (shades) an object involved in the pointer update, either the newly referenced object or the one whose reference was overwritten, so that live objects are not incorrectly collected.
Card Marking: The barrier marks fixed-size memory regions (cards) as "dirty" when references within them change. During collection, only dirty cards need to be scanned rather than the entire heap.
Remembered Set Maintenance: For collectors that track inter-region references precisely, the barrier records the exact locations of modified pointers in a remembered set data structure.
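The card-marking idea described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not any particular runtime's implementation: `ToyHeap`, `write_ref`, `young_roots_from_old`, and the tiny `CARD_SIZE` are hypothetical names and values, and the old generation is modelled as a flat list of reference slots.

```python
CARD_SIZE = 4  # slots per card (assumed; chosen small so the example is easy to trace)


class ToyHeap:
    """Old generation modelled as a flat list of reference slots."""

    def __init__(self, old_slots):
        self.old = [None] * old_slots                 # each slot holds a young object name or None
        n_cards = (old_slots + CARD_SIZE - 1) // CARD_SIZE
        self.card_table = [False] * n_cards           # one dirty flag per card
        self.slots_scanned = 0                        # how much work the collector did

    def write_ref(self, slot, target):
        """Store a reference, then run the post-write barrier."""
        self.old[slot] = target
        self.card_table[slot // CARD_SIZE] = True     # the barrier: dirty the covering card

    def young_roots_from_old(self):
        """Scan only dirty cards to find references into the young generation."""
        found = set()
        for card, dirty in enumerate(self.card_table):
            if not dirty:
                continue                              # clean cards are skipped entirely
            start = card * CARD_SIZE
            for slot in range(start, min(start + CARD_SIZE, len(self.old))):
                self.slots_scanned += 1
                if self.old[slot] is not None:
                    found.add(self.old[slot])
        return found


heap = ToyHeap(old_slots=64)
heap.write_ref(5, "young_a")
heap.write_ref(6, "young_b")    # same card as slot 5
heap.write_ref(40, "young_c")
roots = heap.young_roots_from_old()
print(sorted(roots), heap.slots_scanned)   # ['young_a', 'young_b', 'young_c'] 8
```

In this sketch the collector inspects 8 slots (two dirty cards) instead of all 64, which is the saving that card marking is meant to provide.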
Write barriers are among the hottest code paths in a runtime: they run on essentially every reference write, which happens constantly in typical programs. This makes them critical targets for optimization, and even small improvements in barrier efficiency can yield significant overall performance gains. For example, one generational GC implementation measured a moderate 8% average write-barrier overhead, which was considered acceptable given its gains over non-generational collectors: a 16-24% improvement in total instructions and 10-12% higher mutator utilization.
Dijkstra-Style Insertion Barrier: Triggers when a reference is stored and marks (shades) the newly referenced object. Any object that becomes reachable through the write is therefore visible to the marker, which prevents it from being lost during concurrent marking. Go used this style of barrier before version 1.8, which introduced a hybrid barrier.
Yuasa-Style Deletion Barrier: Triggers when a reference is overwritten and marks the object that was previously referenced. This protects objects that were reachable when marking began but whose last unscanned reference is removed by the write (the snapshot-at-the-beginning approach).
Generational Barrier: Checks whether the store creates an old-to-young reference. If so, it records the source object (e.g., by setting a "remembered" bit and adding it to a remembered set).
Incremental Marking Barrier: During concurrent marking, checks whether a pointer is being written into an already-marked object and, if so, ensures the target gets marked.
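The difference between the insertion and deletion styles can be made concrete with a small tricolor-marking simulation. The sketch below is illustrative only: `Obj`, `Marker`, and `run` are hypothetical names, marking and mutation are interleaved by hand rather than run on separate threads, and real collectors add details (stack handling, batching, memory ordering) that are omitted here.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}        # field name -> referenced Obj or None
        self.color = WHITE


class Marker:
    def __init__(self, barrier):
        self.barrier = barrier  # "none", "dijkstra", or "yuasa"
        self.worklist = []      # grey objects waiting to be scanned

    def shade(self, obj):
        """Turn a white object grey and queue it for scanning."""
        if obj is not None and obj.color == WHITE:
            obj.color = GREY
            self.worklist.append(obj)

    def write(self, holder, field, new_target):
        """Mutator store `holder.field = new_target`, with the chosen barrier."""
        if self.barrier == "yuasa":
            self.shade(holder.fields.get(field))   # deletion barrier: shade the overwritten value
        if self.barrier == "dijkstra":
            self.shade(new_target)                 # insertion barrier: shade the stored value
        holder.fields[field] = new_target

    def step(self):
        """Scan one grey object: shade its children, then blacken it."""
        obj = self.worklist.pop()
        for child in obj.fields.values():
            self.shade(child)
        obj.color = BLACK

    def finish(self):
        while self.worklist:
            self.step()


def run(barrier):
    a, b, c = Obj("a"), Obj("b"), Obj("c")
    a.fields["left"] = b
    b.fields["next"] = c          # a -> b -> c, with a as the only root
    m = Marker(barrier)
    m.shade(a)
    m.step()                      # a is now black, b grey, c still white
    # The mutator runs while marking is paused partway through:
    m.write(a, "right", c)        # a black object now points to white c
    m.write(b, "next", None)      # the only unscanned path to c is removed
    m.finish()
    return c.color                # "white" means c would be reclaimed while still live


for style in ("none", "dijkstra", "yuasa"):
    print(style, run(style))
# none white
# dijkstra black
# yuasa black
```

With no barrier, `c` stays white even though it is reachable from `a`, so it would be freed incorrectly. The insertion barrier catches it at the store into `a`; the deletion barrier catches it when the old reference in `b` is overwritten.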
Cheap Writes, Expensive Collections: A minimal barrier reduces per-write overhead but forces the GC to scan more memory during pauses, increasing latency.
Smarter Writes, Faster Collections: A more sophisticated barrier does extra work on each write to track references more precisely. This reduces scanning during pauses, which shortens them and improves tail latency for large heaps.
Example Impact: A 5-10% reduction in dirty cards from a smarter barrier can translate into milliseconds less pause time per collection. Multiplied across thousands of collections per day, that is a substantial win for server workloads.
Specialized Variants: Modern runtimes such as .NET CoreCLR use multiple barrier variants (e.g., 10 specialized versions for Arm64) tuned for different GC modes, switching between them dynamically to balance write and collection costs.
Throughput Mode: Java's G1 collector offers a simplified barrier when concurrent refinement is disabled (-XX:-G1UseConcRefinement), improving throughput at the cost of potentially longer pauses, which suits latency-insensitive workloads.
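The trade-off between a minimal and a filtering barrier can be illustrated by counting per-write checks against dirty cards. The following is a rough sketch, not a benchmark: `run_barrier`, `is_old`, the address layout (addresses below 64 are "old"), and `CARD_SIZE` are all assumptions made up for the example.

```python
CARD_SIZE = 8          # addresses per card (assumed)
OLD_LIMIT = 64         # addresses below this are in the old generation (assumed)


def is_old(addr):
    return addr < OLD_LIMIT


def run_barrier(stores, filtered):
    """Apply a card-marking barrier to a list of (source_addr, target_addr) stores.

    Returns (number of dirty cards, number of generation checks performed).
    """
    dirty = set()
    checks = 0
    for src, dst in stores:
        if filtered:
            checks += 2                              # one check on the source, one on the target
            if not (is_old(src) and not is_old(dst)):
                continue                             # not old-to-young: nothing to record
        dirty.add(src // CARD_SIZE)                  # dirty the card covering the written slot
    return len(dirty), checks


stores = [
    (0, 70),    # old -> young
    (9, 3),     # old -> old
    (17, 20),   # old -> old
    (25, 71),   # old -> young
    (80, 81),   # young -> young
]

print(run_barrier(stores, filtered=False))   # (5, 0)
print(run_barrier(stores, filtered=True))    # (2, 10)
```

The unconditional barrier does no checks but leaves five cards for the collector to scan; the filtering barrier pays two checks per store and leaves only two. Which side wins depends on the workload and on how expensive pauses are for the application.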
The design of write barriers involves constant optimization at the assembly level. Because they run so frequently, they are often hand-written in assembly and carefully tuned for specific hardware architectures. For instance, .NET CoreCLR's Arm64 barrier was redesigned to use multiple specialized versions, and V8's TurboFan compiler includes optimizations to avoid unnecessary barriers (e.g., when storing to root set objects) and to move the barrier code out of the hot path.
Remembered Set Completeness: For generational GC, it is critical that every old-to-young pointer is recorded. Implementations include extra sanity checks (e.g., verifying remembered set coverage during full collections) to detect missing barriers.
Concurrent Marking Safety: Without barriers, concurrent mutator modifications could hide live objects from the marker, leading to premature reclamation. The barrier prevents this by ensuring the tricolor invariant is maintained.
Data Race Handling: In concurrent collectors, barriers must work correctly even with relaxed memory ordering. For example, the Dart VM's barrier carefully handles races between mutator writes and concurrent marker reads, ensuring correctness without expensive synchronization.
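A remembered-set coverage check of the kind mentioned above might look roughly like the following. This is a simplified sketch, assuming the remembered set stores source objects by name; `Obj`, `write_ref`, `missing_from_remembered_set`, and the `barrier_enabled` switch (used here to simulate a store that skipped its barrier) are hypothetical.

```python
class Obj:
    def __init__(self, name, old):
        self.name = name
        self.old = old          # True if the object lives in the old generation
        self.refs = []          # outgoing references


def write_ref(src, dst, remembered, barrier_enabled=True):
    """Store a reference; the generational barrier records old-to-young sources."""
    src.refs.append(dst)
    if barrier_enabled and src.old and not dst.old:
        remembered.add(src.name)


def missing_from_remembered_set(heap, remembered):
    """Walk the whole heap and report old objects with unrecorded young references."""
    return sorted(
        o.name
        for o in heap
        if o.old and any(not t.old for t in o.refs) and o.name not in remembered
    )


old_a, old_b = Obj("old_a", old=True), Obj("old_b", old=True)
young_x, young_y = Obj("young_x", old=False), Obj("young_y", old=False)
heap = [old_a, old_b, young_x, young_y]
remembered = set()

write_ref(old_a, young_x, remembered)                          # barrier runs normally
write_ref(old_b, young_y, remembered, barrier_enabled=False)   # simulated missing barrier

print(missing_from_remembered_set(heap, remembered))   # ['old_b']
```

Because the check walks every old object, it is only practical at points where the collector already visits the whole heap, which is consistent with running it during full collections.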
In summary, write barriers are an essential mechanism that enables high-performance garbage collection. They represent a classic trade-off in systems design: doing a little more work on every pointer write to save much more work during collection pauses. This trade-off is continuously refined by runtime engineers to achieve the best possible performance for modern, memory-intensive applications.